Class Web Page

R resource page


1. Trend with Seasonality


Trend with Seasonal period \(s\)

  • Suppose \(Y_t\) is the observation. The Model says: \[ Y_t \hspace3mm = \hspace3mm m_t + S_t + X_t \] \[ \begin{align} m_t &: \mbox{ Trend Component }\\ S_t &: \mbox{ Seasonal Component }\\ X_t &: \mbox{ Stationary Time Series } \end{align} \] where \(EX_t=0\).

  • Condition on Seasonal component: \(S_{t+s}=S_t\) and \(\sum_{j=1}^s S_j = 0\).



__Ex: Accident Data

Accidental deaths in USA., 1973 to 1978 from Brockwell

##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov
## 1973  9007  8106  8928  9137 10017 10826 11317 10744  9713  9938  9161
## 1974  7750  6981  8038  8422  8714  9512 10120  9823  8743  9129  8710
## 1975  8162  7306  8124  7870  9387  9556 10093  9620  8285  8433  8160
## 1976  7717  7461  7776  7925  8634  8945 10078  9179  8037  8488  7874
## 1977  7792  6957  7726  8106  8890  9299 10625  9302  8314  8850  8265
## 1978  7836  6892  7791  8129  9115  9434 10484  9827  9110  9070  8633
##        Dec
## 1973  8927
## 1974  8680
## 1975  8034
## 1976  8647
## 1977  8796
## 1978  9240



__Ex: Oil Filter Sales Data

Oil Filter Sales Data (Cryer p7) inside TSA package.

## [1] TRUE
##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1983                               2385 3302 3958 3302 2441 3107
## 1984 5862 4536 4625 4492 4486 4005 3744 2546 1954 2285 1778 3222
## 1985 5472 5310 1965 3791 3622 3726 3370 2535 1572 2146 2249 1721
## 1986 5357 5811 2436 4608 2871 3349 2909 2324 1603 2148 2245 1586
## 1987 5332 5787 2886 5475 3843 2537
## [1] 12



Seasonality

  • \(s\) is the seasonality frequency. (e.g. \(s=12\) for monthly seasonality).

  • Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]

  • Seasonality is not a trend. It’s a temporary deviation from overall trend.
    \[ \sum_{j=1}^s S_j = 0. \]




2. Removing Seasonality

There are three poplular ways of removing seasonality:

  • MA filter

  • Seasonal Average

  • Seaonal Differencing



Method 1: MA filter

  • If \(s\) is odd, let it be \(2q+1\). Then use linear MA filter. \[ \hat m_t \hspace3mm = \hspace3mm \frac{1}{(2q+1)} \sum_{i=-q}^q X_{t-j} \] If \(s\) is even, let it be \(2q\). Then use \[ \hat m_t \hspace3mm = \hspace3mm \frac{1}{2q} \Big(.5 X_{t-q} + \sum_{i=-q-1}^{q-1} X_{t-j} + .5 X_{t+q}\Big) \]

  • Because seasonality should sum up to zero for each seasonal cycle, we have estimated trend: \[ \hat m_t \hspace3mm = \hspace3mm \frac{1}{(2q+1)} \sum_{i=-q}^q m_{t-j} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q S_{t-j} }_{0} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q Y_{t-j} }_{\mbox{ small }}. \] Then estimate for (\(S_t\) + \(Y_t\)) is \[ w_k \hspace3mm = \hspace3mm X_{t} - \hat m_{t} \hspace10mm q < t < n-q. \]

  • Use this to estimate the seasonal part, \[ \hat S_k \hspace3mm = \hspace3mm w_k - \frac{1}{s} \sum_{i=1}^s w_i. \] Now we have deseasonalized data (\(m_t\) + \(Y_t\)) \[ d_t \hspace3mm = \hspace3mm X_t - \hat S_t \hspace10mm t=1 \cdots n \] Now go back and re-estimate trend using \(d_t\).

Filter reveals quadtaric trend




Method 2: Seasonal Average

  • Suppose we have monthly seasonality, \(s=12\).
    Then take average for each month. For example, average for January will be \[ \bar S_1 \hspace3mm = \hspace3mm \sum_{j=0} X_{1+12j} \] and July average will be \[ \bar S_7 \hspace3mm = \hspace3mm \sum_{j=0} X_{7+12j}. \] Note that this seasonal average will take out the trend part as well.



__Ex: Accidents Data

## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS   ADF    PP
## p-val: 0.01 0.885 0.397

Stationary?


Fit deseasonalized series with ARIMA

## Series: Ds.Acci 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.4056
## s.e.   0.1082
## 
## sigma^2 estimated as 66457:  log likelihood=-494.53
## AIC=993.07   AICc=993.24   BIC=997.59

##   B-L test H0: the sereis is uncorrelated
##   M-L test H0: the square of the sereis is uncorrelated
##   J-B test H0: the sereis came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15 BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.755 0.53 0.351 0.557 0.632 0.768 255.614
## Series: Ds.Acci 
## ARIMA(2,0,0) with zero mean 
## 
## Coefficients:
##          ar1     ar2
##       0.6039  0.2916
## s.e.  0.1134  0.1156
## 
## sigma^2 estimated as 67065:  log likelihood=-501.97
## AIC=1009.94   AICc=1010.29   BIC=1016.77

##   B-L test H0: the sereis is uncorrelated
##   M-L test H0: the square of the sereis is uncorrelated
##   J-B test H0: the sereis came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20 BL25  ML15  ML20    JB      SD
## [1,] 0.438 0.192 0.13 0.789 0.697 0.863 257.122



ARIMA with Monthly Average model

After we subtracted the seasonality (Monthly average \(S_t\)), We either have ARIMA(0,1,1) or ARIMA(2,0,0).
\[ \begin{align} Y_t &= \hspace3mm S_t + X_t \\\\ X_t &= \hspace3mm \phi_1 X_{t-1} + \phi_2 X_{t-2} + e_t \hspace10mm \mbox{(if ARIMA(2,0,0))} \end{align} \]




Method 3: Seasonal Differencing (Box-Jenkins Method)

  • For guessed seasonality \(s\), define \[ \bigtriangledown_s = (1-B^s) \] (Don’t confuse this with \(\bigtriangledown^s\)). Then we have \[ \bigtriangledown_s X_t \hspace3mm = \hspace3mm (1-B^s) X_t \hspace3mm = \hspace3mm X_t - X_{t-s} \\\\ \hspace3mm = \hspace3mm m_t - m_{t-s} + \underbrace{ S_t - S_{t-s} }_{0} + Y_t - Y_{t-s}. \] since \(S_t=S_{t-s}\), this eliminates the seasonality. Now fit ARIMA to \(\bigtriangledown_s X_t\).



__Ex: Accidents Data

## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
## Series: diff(diff(Acci, 12)) 
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean 
## 
## Coefficients:
##           ma1     sma1     mean
##       -0.4963  -0.6146  21.0220
## s.e.   0.1351   0.1932  11.9921
## 
## sigma^2 estimated as 98283:  log likelihood=-424.27
## AIC=856.53   AICc=857.27   BIC=864.84

##   B-L test H0: the sereis is uncorrelated
##   M-L test H0: the square of the sereis is uncorrelated
##   J-B test H0: the sereis came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.588 0.457 0.453 0.385 0.497 0.355 307.622



Seasonal ARIMA model

We took \(\bigtriangledown_{12}\), then \(\bigtriangledown\), then fit MA(1). Therefore our model is \[ \bigtriangledown \bigtriangledown_{12} Y_t \hspace3mm = \hspace3mm X_t \\ X_t \hspace3mm = \hspace3mm e_t + \theta_1 e_{t-1} + \theta_{12} e_{t-12} \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace3mm \mbox{ model } \\ \\ \mbox{ sARIMA}(0,1,1) (0,1,0)_{12} \hspace3mm \mbox{ model } \]




3. Ex: Oil Filter Sales Data



M2: Monthly Average

## Warning in pp.test(A): p-value smaller than printed p-value
##         KPSS   ADF   PP
## p-val: 0.015 0.039 0.01
## Series: Ds.Oil 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.8626
## s.e.   0.0696
## 
## sigma^2 estimated as 378095:  log likelihood=-368.67
## AIC=741.35   AICc=741.62   BIC=745.05
## Series: Ds.Oil 
## ARIMA(0,0,0) with zero mean 
## 
## sigma^2 estimated as 379437:  log likelihood=-376.42
## AIC=754.85   AICc=754.93   BIC=756.72

##   B-L test H0: the sereis is uncorrelated
##   M-L test H0: the square of the sereis is uncorrelated
##   J-B test H0: the sereis came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.518 0.525 0.241 0.818 0.804 0.173 622.503



ARIMA with Monthly Average model

After we subtracted the seasonality (Monthly average \(S_t\)), it was WN. \[ \begin{align} Y_t &= \hspace3mm S_t + X_t \\\\ X_t &= \hspace3mm e_t \end{align} \]



M3: Sesonal Difference

## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
##         KPSS  ADF   PP
## p-val: 0.075 0.01 0.01
## Series: diff(Oil, 12) 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##            mean
##       -217.3889
## s.e.   132.7486
## 
## sigma^2 estimated as 652522:  log likelihood=-291.57
## AIC=587.14   AICc=587.5   BIC=590.31

##   B-L test H0: the sereis is uncorrelated
##   M-L test H0: the square of the sereis is uncorrelated
##   J-B test H0: the sereis came from Normal distribution
##   SD         : Standard Deviation of the series
##      BL15  BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.71 0.645 0.502 0.967 0.993 0.055 807.789

We took \(\bigtriangledown_{12}\), then it looks like a WN. Therefore our model is \[ \bigtriangledown_{12} X_t \hspace3mm = \hspace3mm X_t \\ \\ X_t \hspace3mm = \hspace3mm e_t \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace3mm \mbox{ model } \\ \\ \mbox{ sARIMA}(0,0,0) (0,1,0)_{12} \hspace3mm \mbox{ model } \]




R-code Only

R code used in this lecture

# 1. Trend with Seasonality 

### __Ex: Accident Data

  acf1 <- acf; library(TSA); acf <- acf1  #- Load TSA package
  D  <- read.csv("http://gozips.uakron.edu/~nmimoto/pages/datasets/acci.csv", header=T)
  Acci <- ts(D, start=c(1973,1), freq=12)  #- Turn D into ts object with frequency 
    
  plot(Acci, type='o', ylab="num of accidents")
  Acci
    
  #--- plot with month ---
  plot(Acci, type="l", ylab="num of accidents")
  points(y=Acci, x=time(Acci), pch=as.vector(season(Acci)))  


### __Ex: Oil Filter Sales Data

    data(oilfilters)     # load data inside TSA package
    Oil <- oilfilters
    
    is.ts(Oil)     # is data in TS format?
    Oil            # look inside
    frequency(Oil) # what is the frequency?

    #--- plot with month ---
    plot(Oil, type="l",ylab="Sales")
    points(y=Oil, x=time(Oil), pch=as.vector(season(Oil)))



# 2. Removing Seasonality


### __Ex: Accidents Data 

  source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt") 
  plot(Acci, type="o")
    
  #--- Take Monthly Averages
    
  Mav1  <- aggregate(c(Acci), list(month=cycle(Acci)), mean)$x       #- 1yr long Mtly Av 
  M.av1 <- ts(Mav1[cycle(Acci)], start=start(Acci), freq=frequency(Acci))    #- Mtly Av 
  Ds.Acci <- Acci-M.av1                                        #- Subtract from original
    
  plot(M.av1,   type="o")  
  plot(Ds.Acci, type="o")
    
  
  plot(Acci,    type="o")
  plot(M.av1,   type="o")
  plot(Ds.Acci, type="o"); abline(h=0)

  Stationarity.tests(Ds.Acci)



### __Ex: Oil Filter Sales 

  plot(Oil, type="o")
    
  #--- Take Monthly Averages
  Mav2  <- aggregate(c(Oil), list(month=cycle(Oil)), mean)$x           #- 1yr long Mtly Av $
  M.av2 <- ts(Mav2[cycle(Oil)], start=start(Oil), freq=frequency(Oil))  #- Mtly Av as long as D2
  Ds.Oil <- Oil-M.av2                                                  #- Subtract from original
    
  plot(M.av2,  type="o")
  plot(Ds.Oil, type="o")
    
  plot(Oil, type="o")
  plot(M.av2, type="o")
  plot(Ds.Oil, type="o"); abline(h=0)

  Stationarity.tests(Ds.Oil)


  library(forecast)
  source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")  

  auto.arima(Ds.Acci)
  auto.arima(Ds.Oil)

  Fit1 <- auto.arima(Ds.Acci, d=0)
  Fit1

  Randomness.tests(Fit1$resid)

  Fit2 <- auto.arima(Ds.Oil, d=0)
  Fit2

  Randomness.tests(Fit2$resid)  


### __Ex: Accidents Data
    
  plot(Acci, type='o', ylab="num of accidents")
  plot(diff(Acci, 12))                # take seasonal difference (Del_12)    
  plot(diff(diff(Acci, 12)))          # take (Del)(Del_12)

  Stationarity.tests(diff(diff(Acci, 12)))

  auto.arima(diff(diff(Acci, 12)))

  Fit3 <- Arima(diff(diff(Acci, 12)), order=c(0,0,1) )

  Randomness.tests(Fit3$resid)      



## __Ex: Oil Filter Sales Data


  plot(diff(Oil, 12))                # take seasonal difference (Del_12)    

  Stationarity.tests(diff(Oil, 12))
  Fit4 <- auto.arima(diff(Oil, 12))
  Fit4

  Randomness.tests(Fit4$resid)